Fine-grained categorization can benefit from part-based features, which reveal subtle visual differences between object categories. Handcrafted features have been widely used for part detection and classification. Although a recent trend seeks to learn such features automatically using powerful deep learning models such as convolutional neural networks (CNNs), their training, and possibly also their testing, requires manually provided annotations that are costly to obtain. To relax these requirements, we assume in this study a general problem setting in which the raw images are provided only with object-level class labels for model training, with no other side information needed. Specifically, by extracting and interpreting the hierarchical hidden-layer features learned by a CNN, we propose an elaborate CNN-based system for fine-grained categorization. When evaluated on the Caltech-UCSD Birds-200-2011, FGVC-Aircraft, Cars, and Stanford Dogs datasets under the setting that only object-level class labels are used for training and no other annotations are available for either training or testing, our method achieves performance that is superior or comparable to the state of the art. Moreover, it sheds some light on the use of the hierarchical features learned by CNNs, which has wide applicability well beyond the current fine-grained categorization task.